Molecular classification of cancer: unsupervised self-organizing map analysis of gene expression microarray data.

Authors

  • David G Covell
  • Anders Wallqvist
  • Alfred A Rabow
  • Narmada Thanki
Abstract

An unsupervised self-organizing map-based clustering strategy has been developed to classify tissue samples from an oligonucleotide microarray patient database. Our method is based on the likelihood that a test data vector may have a gene expression fingerprint that is shared by more than one tumor class, and as such it can identify datasets that cannot be unequivocally assigned to a single tumor class. Our self-organizing map analysis completely separated the tumor from the normal expression datasets. Across the 14 different tumor types, classification accuracies of approximately 80% were achieved. Nearly perfect classifications were found for leukemia, central nervous system, melanoma, uterine, and lymphoma tumor types, with very poor classifications found for colorectal, ovarian, breast, and lung tumors. Classification results were further analyzed to identify sets of genes differentially expressed between tumor and normal samples and among the tumor classes. Within the total pool of 1139 genes most differentially expressed in this dataset, subsets were found that could be vetted against previously published literature as specific tumor markers. Attempts to classify gene expression datasets from other sources yielded a wide range of classification accuracies. The utility of this method and the quality of data needed for accurate tumor classification are discussed.
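
The abstract does not give the algorithmic details, so the following is only a minimal sketch of a generic Kohonen self-organizing map, not the authors' implementation. It trains on synthetic stand-in data without using class labels, then uses the labels afterwards to score a majority-vote classification and to flag samples that land on a map node shared by more than one class, which is one simple way to approximate the "shared fingerprint" idea. All function names, grid sizes, learning-rate schedules, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_samples, _ = data.shape
    # Initialise node weights from randomly chosen samples (an assumed choice).
    weights = data[rng.integers(0, n_samples, rows * cols)].astype(float)
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_samples):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood radius
            x = data[i]
            # Best-matching unit: node whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards the sample.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def best_matching_units(data, weights):
    """Index of the closest node for every sample."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def node_class_counts(bmus, labels, n_nodes, n_classes):
    """Count how many samples of each class map to each node."""
    counts = np.zeros((n_nodes, n_classes), dtype=int)
    np.add.at(counts, (bmus, labels), 1)
    return counts

# Synthetic stand-in for expression profiles: 3 hypothetical classes, 20 "genes".
rng = np.random.default_rng(1)
n_classes, per_class, n_genes = 3, 30, 20
centers = rng.normal(0.0, 3.0, size=(n_classes, n_genes))
labels = np.repeat(np.arange(n_classes), per_class)
data = centers[labels] + rng.normal(0.0, 1.0, size=(labels.size, n_genes))

weights = train_som(data)                    # labels are not used during training
bmus = best_matching_units(data, weights)
counts = node_class_counts(bmus, labels, weights.shape[0], n_classes)

# Label each node by majority vote, then score the resulting classification.
node_label = counts.argmax(axis=1)
predicted = node_label[bmus]
accuracy = float((predicted == labels).mean())

# A sample is "ambiguous" if its node also attracts samples of another class.
shared_node = (counts > 0).sum(axis=1) > 1
ambiguous = shared_node[bmus]

print(f"majority-vote accuracy: {accuracy:.2f}")
print(f"samples on shared nodes: {int(ambiguous.sum())} of {labels.size}")
```

On real microarray data the profiles would typically be normalized and filtered to informative genes first, and accuracy would be estimated on held-out samples rather than on the training set as in this sketch.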

Similar resources

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next-generation sequencing (NGS) data are important sources for finding useful molecular patterns. The large volume of gene expression data also increases the challenge of identifying the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data offer a wealth of information spanning thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and of differences between tumors. In this study we try to reduce high-dimensional data with a statistical method to select high-impact genes as biomarkers and then classify ovarian tumors based on gene expression data of...

Mining Microarray Gene Expression Data

DNA microarray technology provides biologists with the ability to measure the expression levels of thousands of genes in a single experiment. Many initial experiments suggest that genes of similar function yield similar expression patterns in microarray hybridization experiments. Hence it becomes possible to distinguish genes with different functions by analyzing gene data generated from microarr...

An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data

MOTIVATION: Current Self-Organizing Map (SOM) approaches to gene expression pattern clustering require the user to predefine the number of clusters expected. Hierarchical clustering methods used in this area do not provide a unique partitioning of the data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to per...

Expression Profiling of Microarray Gene Signatures in Acute and Chronic Myeloid Leukaemia in Human Bone Marrow

Background: Classification of cancer subtypes by means of microarray signatures is becoming increasingly difficult to ignore as a means to transform pathological diagnosis; nonetheless, measurement of indicator genes in routine practice appears to be arduous. In a previously published study, we utilized real-time PCR measurement of indicator genes in acute lymphoid leukaemia (ALL) and acute m...

Journal:
  • Molecular Cancer Therapeutics

Volume 2, Issue 3

Pages: -

Publication date: 2003